Homogenous ensemble phonotactic language recognition based on SVM supervector reconstruction

Author: A Stolcke
A Zolnay
B Scholkopf
D Garcia-Romero
D Povey
E Singer
F Zheng
G Heigold
H Li
H Xiong
Jia Liu
KC Sim
LF Dharo Enriquez
LJ Rodriguez-Fuentes
M Penagarikano
M Wang
MA Zissman
Michael T Johnson
MP Lewis
N Dehak
N Morgan
P Matejka
P Schwarz
P Vincent
PA Torres-Carrasquillo
R Collobert
V Hubeika
VW Zue
W-Q Zhang
W-W Liu
Wei-Qiang Zhang
Wei-Wei Liu
WM Campbell
YK Muthusamy
Z Jancik
Publication venue: 'Springer Science and Business Media LLC'

Language Identification in Short Utterances Using Long Short-Term Memory (LSTM) Recurrent Neural Networks

Author: A Graves
A Lozano-Diez
A rahman Mohamed
Alicia Lozano-Diez
CM Bishop
D Martinez
D Reynolds
D Yu
Doroteo T. Toledano
F Gers
F Richardson
F Weninger
FA Gers
G Hinton
H Li
Ian McLoughlin
J Gonzalez-Dominguez
J Schmidhuber
Javier Gonzalez-Dominguez
Joaquin Gonzalez-Rodriguez
M Van Segbroeck
N Dehak
P Kenny
PA Torres-Carrasquillo
Ruben Zazo
Y Song
YK Muthusamy
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2016

Zazo R, Lozano-Diez A, Gonzalez-Dominguez J, T. Toledano D, Gonzalez-Rodriguez J (2016) Language Identification in Short Utterances Using Long Short-Term Memory (LSTM) Recurrent Neural Networks. PLoS ONE 11(1): e0146917. doi:10.1371/journal.pone.0146917

Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNNs) have recently outperformed other state-of-the-art approaches, such as i-vector and Deep Neural Network (DNN) systems, in automatic Language Identification (LID), particularly when dealing with very short utterances (about 3 s). In this contribution we present an open-source, end-to-end LSTM RNN system running on limited computational resources (a single GPU) that outperforms a reference i-vector system on a subset of the NIST Language Recognition Evaluation (8 target languages, 3 s task) by up to 26%. This result is in line with previously published research that used proprietary LSTM implementations and huge computational resources, which made those earlier results hardly reproducible. Further, we extend those previous experiments by modeling unseen languages (out-of-set, or OOS, modeling), which is crucial in real applications. Results show that an LSTM RNN with OOS modeling is able to detect these languages and generalizes robustly to unseen OOS languages. Finally, we analyze the effect of even more limited test data (from 2.25 s down to 0.1 s), showing that with as little as 0.5 s an accuracy of over 50% can be achieved.

This work has been supported by project CMC-V2: Caracterizacion, Modelado y Compensacion de Variabilidad en la Señal de Voz (TEC2012-37585-C02-01), funded by Ministerio de Economia y Competitividad, Spain.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Directory of Open Access Journals

PubMed Central

Biblos-e Archivo